Smart Paper Notes

Which anonymization technique is best for which NLP task? -- It depends. A Systematic Study on Clinical Text Processing

Iyadh Ben Cheikh Larbi , Aljoscha Burchardt , Roland Roller

Category: Natural Language Processing

2022-09-01

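Clinical text processing has attracted growing attention in recent years. Access to sensitive patient data, however, remains a major challenge, because text cannot be shared without legal hurdles and without removing personal information. Many techniques exist to modify or remove patient-related information, each with different strengths. This paper investigates how different anonymization techniques affect the performance of ML models, using multiple datasets corresponding to five different NLP tasks. Several learnings and recommendations are presented. The work confirms that particularly strong anonymization techniques lead to a significant drop in performance. In addition, most of the presented techniques are not secure against re-identification attacks based on similarity search.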

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

Lisa Raithel , Philippe Thomas , Roland Roller , Oliver Sapina , Sebastian Möller , Pierre Zweigenbaum

Category: Natural Language Processing | Machine Learning

2022-08-03

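In this work, we present the first corpus for German adverse drug reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary-annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common for social media data in this domain, the class labels of the corpus are highly imbalanced. This, together with a strong topic imbalance, makes it a very challenging dataset, since the same symptom can often have several causes and is not always related to medication intake. We aim to encourage further multilingual efforts in the field of ADR detection and provide preliminary experiments for binary classification using zero- and few-shot learning methods based on a multilingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1 score of 37.52 for the positive class. We make the dataset and models publicly available to the community.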
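
The positive-class F1 score reported above can be illustrated with a minimal sketch. The labels below are invented for illustration and are not taken from the corpus; the function name `positive_class_f1` is hypothetical. The sketch shows why accuracy is a poor guide on imbalanced labels, while F1 on the rare class is not. Note that the abstract reports F1 on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
def positive_class_f1(y_true, y_pred, positive=1):
    """F1 score of the positive class for binary labels (value in [0, 1])."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Invented, imbalanced toy labels: 2 positive documents out of 10.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # one hit, one miss, one false alarm
all_negative = [0] * 10                      # trivial "never ADR" classifier

toy_f1 = positive_class_f1(toy_true, toy_pred)
trivial_f1 = positive_class_f1(toy_true, all_negative)
trivial_acc = accuracy(toy_true, all_negative)
```

On these toy labels the trivial classifier is right on 8 of 10 documents yet scores an F1 of zero on the positive class, which is why the positive-class F1 is the more informative number for a corpus like this one.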

A Medical Information Extraction Workbench to Process German Clinical Text

Roland Roller , Laura Seiffe , Ammer Ayach , Sebastian Möller , Oliver Marten , Michael Mikhailov , Christoph Alt , Danilo Schmidt , Fabian Halleck , Marcel Naik

Category: Natural Language Processing

2022-07-08

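Background: In information extraction and natural language processing, accessible datasets are crucial for reproducing and comparing results. Publicly available implementations and tools can serve as benchmarks and facilitate the development of more complex applications. In clinical text processing, however, accessible datasets are scarce, and so are existing tools. One of the main reasons is the sensitivity of the data. The problem is even more pronounced for non-English languages. Methods: To address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Results: The presented models provide promising results on in-domain data. Moreover, we show that our models can also be applied successfully to other biomedical text in German. Our workbench is publicly available, so it can be used out of the box or transferred to related problems.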

Efficient aggregation of face embeddings for decentralized face recognition deployments (extended version)

Philipp Hofer , Michael Roland , Philipp Schwarz , René Mayrhofer

Category: Artificial Intelligence | Computer Vision

2022-12-20

Biometrics are among the most privacy-sensitive kinds of data. Ubiquitous authentication systems with a focus on privacy favor decentralized approaches as they reduce potential attack vectors, both on a technical and organizational level. The gold standard is to let the user be in control of where their own data is stored, which consequently leads to a high variety of devices used. Moreover, in comparison with a centralized system, designs with higher end-user freedom often incur additional network overhead. Therefore, when using face recognition for biometric authentication, an efficient way to compare faces is important in practical deployments, because it reduces both network and hardware requirements, which is essential to encourage device diversity. This paper proposes an efficient way to aggregate embeddings used for face recognition, based on an extensive analysis of different datasets and different aggregation strategies. As part of this analysis, a new dataset has been collected, which is available for research purposes. Our proposed method supports the construction of massively scalable, decentralized face recognition systems with a focus on both privacy and long-term usability.
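
To make the idea of aggregating embeddings concrete, here is a minimal sketch of one common strategy: averaging L2-normalized embeddings into a single template and comparing a probe against it by cosine similarity. The abstract does not say which strategy the paper recommends, so the averaging choice, the function names, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def aggregate_mean(embeddings):
    """Aggregate several embeddings of one person into a single unit-length template.

    Assumption: mean of L2-normalized embeddings, then re-normalized.
    """
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[k] for e in normalized) / len(normalized) for k in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy 2-D "embeddings"; real face embeddings typically have hundreds of dimensions.
template = aggregate_mean([[1.0, 0.0], [0.0, 1.0]])
same_direction = cosine_similarity(template, [1.0, 1.0])
off_direction = cosine_similarity(template, [1.0, 0.0])
```

With a single template per person, a device compares each probe against one vector instead of every stored embedding, which is the kind of saving in network and hardware cost the abstract refers to.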

Fast Converging Anytime Model Counting

Yong Lai , Kuldeep S. Meel , Roland H. C. Yap

Category: Artificial Intelligence

2022-12-19

Model counting is a fundamental problem which has been influential in many applications, from artificial intelligence to formal verification. Due to the intrinsic hardness of model counting, approximate techniques have been developed to solve real-world instances of model counting. This paper designs a new anytime approach called PartialKC for approximate model counting. The idea is a form of partial knowledge compilation to provide an unbiased estimate of the model count which can converge to the exact count. Our empirical analysis demonstrates that PartialKC achieves significant scalability and accuracy over prior state-of-the-art approximate counters, including satss and STS. Interestingly, the empirical results show that PartialKC reaches convergence for many instances and therefore provides exact model counting performance comparable to state-of-the-art exact counters.

An annotated instance segmentation XXL-CT dataset from a historic airplane

Roland Gruber , Nils Reims , Andreas Hempfer , Stefan Gerth , Michael Salamon , Thomas Wittenberg

Category: Computer Vision

2022-12-16

The Me 163 was a Second World War fighter airplane and a result of the German air force's secret developments. One of these airplanes is currently owned and displayed in the historic aircraft exhibition of the Deutsches Museum in Munich, Germany. To gain insights into its history, design and state of preservation, a complete CT scan was obtained using an industrial XXL computed tomography scanner. Using the CT data from the Me 163, all its details can be examined visually at various levels, ranging from the complete hull down to single sprockets and rivets. However, while a trained human observer can identify and interpret the volumetric data with all its parts and connections, a virtual dissection of the airplane into its different parts would be highly desirable. This requires an instance segmentation of all components and objects of interest into disjoint entities from the CT data. As no adequate computer-assisted tools for automated or semi-automated segmentation of such XXL airplane data are currently available, an interactive data annotation and object labeling process has been established as a first step. So far, seven 512 x 512 x 512 voxel sub-volumes from the Me 163 airplane have been annotated and labeled; the results can potentially be used for various new applications in digital heritage, non-destructive testing, or machine learning. This work describes the data acquisition process of the airplane using an industrial XXL-CT scanner, outlines the interactive segmentation and labeling scheme used to annotate sub-volumes of the airplane's CT data, and discusses various challenges in interpreting and handling the annotated and labeled data.

Resilient Terrain Navigation with a 5 DOF Metal Detector Drone

Patrick Pfreundschuh , Rik Bahnemann , Tim Kazik , Thomas Mantel , Roland Siegwart , Olov Andersson

Category: Robotics

2022-12-14

Micro aerial vehicles (MAVs) hold the potential for performing autonomous and contactless land surveys for the detection of landmines and explosive remnants of war (ERW). Metal detectors are the standard tool, but have to be operated close to and parallel to the terrain. As this requires advanced flight capabilities, they have not been successfully combined with MAVs before. To this end, we present a full system to autonomously survey challenging undulated terrain using a metal detector mounted on a 5 degrees of freedom (DOF) MAV. Based on an online estimate of the terrain, our receding-horizon planner efficiently covers the area, aligning the detector to the surface while considering the kinematic and visibility constraints of the platform. For resilient localization, we propose a factor-graph approach for online fusion of GNSS, IMU and LiDAR measurements. A simulated ablation study shows that the proposed planner reduces coverage duration and improves trajectory smoothness. Real-world flight experiments showcase autonomous mapping of buried metallic objects in undulated and obstructed terrain. The proposed localization approach is resilient to individual sensor degeneracy.

A Multi-Segment, Soft Growing Robot with Selective Steering

Alexander M. Kübler , Sebastián Urdaneta Rivera , Frances B. Raphael , Julian Förster , Roland Siegwart , Allison M. Okamura

Category: Robotics

2022-12-07

Everting, soft growing vine robots benefit from reduced friction with their environment, which allows them to navigate challenging terrain. Vine robots can use air pouches attached to their sides for lateral steering. However, when all pouches are serially connected, the whole robot can only perform one constant curvature in free space. It must contact the environment to navigate through obstacles along paths with multiple turns. This work presents a multi-segment vine robot that can navigate complex paths without interacting with its environment. This is achieved by a new steering method that selectively actuates each single pouch at the tip, providing high degrees of freedom with few control inputs. A small magnetic valve connects each pouch to a pressure supply line. A motorized tip mount uses an interlocking mechanism and motorized rollers on the outer material of the vine robot. As each valve passes through the tip mount, a permanent magnet inside the tip mount opens the valve so the corresponding pouch is connected to the pressure supply line at the same moment. Novel cylindrical pneumatic artificial muscles (cPAMs) are integrated into the vine robot and inflate to a cylindrical shape for improved bending characteristics compared to other state-of-the-art vine robots. The motorized tip mount controls a continuous eversion speed and enables controlled retraction. A final prototype was able to repeatably grow into different shapes and hold these shapes. We predict the path using a model that assumes a piecewise constant curvature along the outside of the multi-segment vine robot. The proposed multi-segment steering method can be extended to other soft continuum robot designs.
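
The piecewise constant curvature assumption can be sketched in a few lines. The version below is a simplified planar model that chains circular arcs along a single curve; it is not the paper's model, which applies the assumption along the outside of the robot. The function names and the (curvature, arc length) segment format are hypothetical.

```python
import math


def advance_arc(x, y, heading, curvature, arc_length):
    """Follow one constant-curvature arc from pose (x, y, heading).

    curvature > 0 turns left, curvature < 0 turns right, 0 is a straight line.
    """
    if curvature == 0.0:
        return (x + arc_length * math.cos(heading),
                y + arc_length * math.sin(heading),
                heading)
    new_heading = heading + curvature * arc_length
    # Closed-form integral of (cos, sin) of a linearly changing heading.
    new_x = x + (math.sin(new_heading) - math.sin(heading)) / curvature
    new_y = y + (math.cos(heading) - math.cos(new_heading)) / curvature
    return new_x, new_y, new_heading


def predict_path(segments, start=(0.0, 0.0, 0.0)):
    """Return the pose at the end of every segment, beginning with the start pose.

    segments: iterable of (curvature, arc_length) pairs, one per robot segment.
    """
    poses = [start]
    for curvature, arc_length in segments:
        poses.append(advance_arc(*poses[-1], curvature, arc_length))
    return poses


# An S-shaped path: left quarter circle, then right quarter circle, both of radius 1.
s_curve = predict_path([(1.0, math.pi / 2), (-1.0, math.pi / 2)])
```

Because each segment carries its own curvature, a chain of segments can describe paths with several turns, which a robot with serially connected pouches (one curvature for the whole body) cannot follow in free space.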

maplab 2.0 -- A Modular and Multi-Modal Mapping Framework

Andrei Cramariuc , Lukas Bernreiter , Florian Tschopp , Marius Fehr , Victor Reijgwart , Juan Nieto , Roland Siegwart , Cesar Cadena

Category: Robotics

2022-12-01

Integration of multiple sensor modalities and deep learning into Simultaneous Localization And Mapping (SLAM) systems are areas of significant interest in current research. Multi-modality is a stepping stone towards achieving robustness in challenging environments and interoperability of heterogeneous multi-robot systems with varying sensor setups. With maplab 2.0, we provide a versatile open-source platform that facilitates developing, testing, and integrating new modules and features into a fully-fledged SLAM system. Through extensive experiments, we show that maplab 2.0's accuracy is comparable to the state-of-the-art on the HILTI 2021 benchmark. Additionally, we showcase the flexibility of our system with three use cases: i) large-scale (approx. 10 km) multi-robot multi-session (23 missions) mapping, ii) integration of non-visual landmarks, and iii) incorporating a semantic object-based loop closure module into the mapping framework. The code is available open-source at https://github.com/ethz-asl/maplab.

Applying Deep Reinforcement Learning to the HP Model for Protein Structure Prediction

Kaiyuan Yang , Houjing Huang , Olafs Vandans , Adithya Murali , Fujia Tian , Roland H. C. Yap , Liang Dai

Category: Machine Learning

2022-11-27

A central problem in computational biophysics is protein structure prediction, i.e., finding the optimal folding of a given amino acid sequence. This problem has been studied in a classical abstract model, the HP model, where the protein is modeled as a sequence of H (hydrophobic) and P (polar) amino acids on a lattice. The objective is to find conformations maximizing H-H contacts. It is known that even in this reduced setting, the problem is intractable (NP-hard). In this work, we apply deep reinforcement learning (DRL) to the two-dimensional HP model. We can obtain the conformations of best known energies for benchmark HP sequences with lengths from 20 to 50. Our DRL is based on a deep Q-network (DQN). We find that a DQN based on long short-term memory (LSTM) architecture greatly enhances the RL learning ability and significantly improves the search process. DRL can sample the state space efficiently, without the need of manual heuristics. Experimentally we show that it can find multiple distinct best-known solutions per trial. This study demonstrates the effectiveness of deep reinforcement learning in the HP model for protein folding.
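
The objective of the two-dimensional HP model can be made concrete with a short sketch that scores a given conformation. It assumes the common convention of an energy of -1 per H-H contact, where a contact is a pair of H residues that are lattice neighbors but not adjacent in the sequence; the abstract only states that H-H contacts are maximized. The function name `hp_energy` and the toy sequences are hypothetical, and the sketch covers scoring only, not the reinforcement learning search.

```python
def hp_energy(sequence, coords):
    """Energy of a 2-D square-lattice HP conformation: -1 per H-H contact.

    sequence: string over {"H", "P"}; coords: list of (x, y) lattice points,
    one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighboring pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# Four residues folded into a unit square: the two chain ends become neighbors.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
folded_energy = hp_energy("HPPH", square)
# The same sequence stretched out in a line has no contacts.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
extended_energy = hp_energy("HPPH", line)
```

A search method, whether a hand-written heuristic or a learned policy, would propose conformations and use a score like this one as its reward signal.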
